The relationship between the design and functionality of molecular networks is now a key issue
in biology. Comparison of regulatory networks performing similar tasks can give insights into how
network architecture is constrained by the functions it directs. Here we discuss methods of network
comparison based on network architecture and signaling logic. Introducing local and global signaling
scores for the difference between two networks, we quantify similarities between evolutionarily closely
and distantly related bacteriophages. Despite the large evolutionary separation between phages λ and
186, their networks are found to be similar when the difference is measured in terms of global signaling.
Finally, we discuss how network alignment can be used to pinpoint protein similarities viewed
from the network perspective.
SYNOPSIS

Networks of interacting genes and proteins orchestrate 
the complex functions of every living cell. Decoding the 
logic of these biochemical circuits is a central challenge 
facing biology today. Trusina et al. describe a mathemat- 
ical method for aligning two regulatory networks based 
on their signaling properties, and apply it to a case study 
of three bacteriophages, simple biological "computers" 
whose genetics are exceptionally well characterized. The 
comparison reveals a surprising similarity between reg- 
ulatory networks of the creatures, even when they have 
very distant evolutionary relationships. The method in- 
troduced here should be applicable to other networks, 
and thus help illuminate the computational substructures 
of living systems. 



INTRODUCTION 

The functioning of living organisms is based on an 
intricate network of genes and proteins regulating each 
other. Various organisms differ due not only to differ- 
ences in the constituting components (genes/proteins), 
but also because of the organization of these regulatory 
networks. It is therefore important to address similar- 
ities and differences not only in protein sequences but 
also in the interaction patterns of the proteins. Thus, 
large-scale analyses of protein-protein and protein-DNA
interactions have provided insight into the local design
features of subcellular signaling, and network alignment
based on sequence similarities permits alignment of
related motifs.

Here we propose to compare networks through an alignment
method that is based solely on network architecture
and signaling logic, and thus does not rely on sequence
similarity of the involved proteins.

As a case study we consider the regulatory networks
of two very well-characterized temperate bacteriophages
of E. coli, λ and 186 (Fig. 1). These two phages represent
two distinct classes of temperate bacteriophages: the
lambdoid phages, which include λ, P22, 434, HK97 and
HK022, and the P2 group, which includes P2, 186, HP1,
K139 and PSP3. λ and 186 are not detectably related in
sequence and have different genome organizations. Using
tBLASTx [6] to compare all of the reading frames, there
are only two clearly homologous protein pairs: the λ
endolysin R (P03706)/186 (PO80309) (E-score = $10^{-34}$)
and a pair of early lytic proteins of unknown function
(E-score = $2 \times 10^{-4}$). No significant similarity was
detectable at the nucleotide level (using BLASTn). At
the genome level, the arrangement of genes, promoters
and operators is very different. As a control
of methodology, we also consider the P22 phage, which
as a member of the lambdoid family allows us to compare
topologies of evolutionarily related networks.

As temperate phages, both 186 and λ can be in
two states: a lytic state, where many proteins are active
in the replication of the phage DNA and the construction
and release of virus particles, and a lysogenic state, where
the phage genome is integrated into the bacterial chromosome
and only a few proteins are active. For both phages,
three core proteins (CI (P03034), Cro (P03040) and CII
(P03042) in λ, and CI (P08707), Apl (P21681) and CII
(P21678) in 186) do the main computations, with the
switch into lysogeny being coordinated by CII and the
reverse switch into the lytic mode initiated by activation
of the host SOS response protein RecA (P03017). The
gene regulatory networks of all temperate phages have
evolved to provide lysogenic and lytic states and, more
than that, to switch from one state to the other when
particular signals have been received from bacterial proteins,
and thus effectively perform the same function.

FIG. 1: The genetic regulatory networks of phages 186,
λ and P22, all of which are temperate and infect Escherichia
coli. The proteins are colored according to their
functions and expression mode in the lysis-lysogeny life cycle
of the phages. We summarize the influence of one protein
on another by either a green (positive, e.g., transcriptional
activation) or a red (negative, e.g., repression) arrow. The
dashed lines show relatively weak regulations. The database
entry for the λ genome is J02459, for the 186 genome U32222,
and for the P22 genome NC_002371.

Given that 186 and λ are both temperate, i.e. perform
similar functions, but are evolutionarily separated,
we asked whether we can detect structural similarities,
and at what scale these similarities are detectable.



RESULTS 

Visual comparison of the 186 and λ networks (Fig. 1)
suggests both strong similarities and major differences.
One way to quantify the similarity of two networks
is by edit distance [11]. Assume that we know
which nodes (here, proteins) in networks A and B should
be paired. For networks of the same size, we define edit
distance as the number of insertions or removals of edges
(regulatory connections) one has to perform on network
A to obtain B. This is quantified through

$$D_E(A,B) = \sum_{i,j} |A_{ij} - B_{ij}| \qquad (1)$$

The elements $A_{ij}$ and $B_{ij}$ specify whether the direct
regulation of protein i on protein j is positive, negative or absent,
and are constructed such that each element can keep both
positive and negative links (for details see Eq. (2) below).
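As a minimal sketch of Eq. (1), assume (this representation is ours, not necessarily the authors') that each network is stored as a set of signed edges (regulator, target, sign). For nodes that are already paired, the edit difference is then the size of the symmetric difference of the two edge sets. The function name and the toy networks are hypothetical.

```python
def edit_difference(edges_a, edges_b):
    """Count edge insertions/removals needed to turn network A into B.

    Each edge is a tuple (regulator, target, sign) with sign +1 or -1.
    A sign change counts as two edits: one removal plus one insertion.
    """
    return len(set(edges_a) ^ set(edges_b))


# Toy networks on already-paired nodes 0, 1, 2 (illustrative only).
net_a = {(0, 1, +1), (1, 2, -1)}
net_b = {(0, 1, -1), (1, 2, -1), (2, 0, +1)}
print(edit_difference(net_a, net_b))  # 3: sign flip on 0->1 (2 edits) + new edge 2->0
```

Storing positive and negative links as separate entries is what lets one pair of proteins carry both kinds of link at once.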

In case we do not know which nodes in networks A and
B should be paired, we find the optimal identification by
minimizing $D_E$ as described in the Materials and Methods
section. This yields the minimal distance between the
networks, as well as an optimal alignment of the individual
nodes. We call this distance the edit difference.
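To make "minimizing over pairings" concrete, the sketch below searches all pairings exhaustively. This is only feasible for a handful of nodes and is not the procedure used in the paper, which relies on a Metropolis search (see Materials and Methods). The signed-edge representation and all names are assumptions made for illustration.

```python
from itertools import permutations


def min_edit_difference(edges_a, edges_b, n):
    """Exhaustively search pairings of n nodes; feasible only for tiny n.

    perm[i] is the node of B paired with node i of A.
    Returns (minimal difference, best pairing).
    """
    edges_b = set(edges_b)
    best = None
    for perm in permutations(range(n)):
        mapped = {(perm[i], perm[j], s) for i, j, s in edges_a}
        d = len(mapped ^ edges_b)
        if best is None or d < best[0]:
            best = (d, perm)
    return best


# Network B is network A with its nodes relabeled (illustrative only).
net_a = {(0, 1, -1), (1, 2, +1)}
net_b = {(2, 0, -1), (0, 1, +1)}
distance, pairing = min_edit_difference(net_a, net_b, 3)
print(distance, pairing)  # 0 (2, 0, 1)
```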

The minimal edit difference between related phages is
small, $D_E(\lambda, \mathrm{P22}) = 18$, compared to the larger scores for
evolutionarily separated phages (see Table I). $D_E = 18$
means that the λ network of 62 proteins and 144 connections
can be constructed by making 18 edits of the
connections in a 62-protein subset of the 67-protein P22
network (adding or removing a link is a single edit; changing
the sign of a connection needs two edits). To get
an idea of the significance of the obtained $D_E$ values,
we compare with optimal alignments of 500 randomized
versions of the two networks. The randomization procedure
was designed to conserve the local properties of
the networks, in order to keep their general biological
features. Firstly, the core-hub topology common in
biological networks [2] was maintained by conserving for
each protein the number of its regulators (inputs) and the
number of proteins regulated by it (outputs). Secondly,
the number of each sign (positive and negative) of the
input and output connections was kept for each node.

FIG. 2: Illustration of the differences between the real 186
and λ networks (top) and an example of their randomized
counterparts (bottom). These examples of randomized networks
show that it is possible to preserve local properties, yet
obtain different network structures.

The constraint of preserving the local properties does
not fix the network completely: while keeping the number
of positively and negatively regulated proteins, one can
still change exactly which of them are regulated.
The structure of the resulting random networks is rather
different, as seen in the examples shown in Figure 2.
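The text states which properties the randomization conserves but not how it is carried out. One common way to satisfy those constraints, assumed here, is to repeatedly swap the targets of two links of the same sign. The sketch below is such an edge-swap randomization on hypothetical signed-edge sets; it allows self-regulation links to appear, which is also an assumption.

```python
import random
from collections import Counter


def randomize(edges, n_swaps, seed=0):
    """Shuffle a signed network by swapping targets of same-sign links.

    Replacing (a -> b) and (c -> d) by (a -> d) and (c -> b) keeps, for
    every node, the number of positive and negative inputs and outputs.
    """
    rng = random.Random(seed)
    edges = set(edges)
    for _ in range(n_swaps):
        (a, b, s), (c, d, t) = rng.sample(sorted(edges), 2)
        if s != t:
            continue  # only swap targets between links of the same sign
        new_1, new_2 = (a, d, s), (c, b, s)
        if new_1 in edges or new_2 in edges:
            continue  # the swap would duplicate an existing link
        edges -= {(a, b, s), (c, d, s)}
        edges |= {new_1, new_2}
    return edges


def local_signature(edges):
    """Per-node counts of outputs and inputs, split by sign."""
    outputs = Counter((i, s) for i, j, s in edges)
    inputs = Counter((j, s) for i, j, s in edges)
    return outputs, inputs


# Toy network (illustrative only).
toy = {(0, 1, +1), (0, 2, -1), (1, 3, +1), (2, 3, -1), (3, 0, +1), (2, 1, +1)}
shuffled = randomize(toy, n_swaps=200)
print(local_signature(shuffled) == local_signature(toy))  # True
```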

Overall we find that $D_E$ scores between any pair of randomized
networks are similar. When comparing the scores
between real networks with those of their random counterparts
in Table I, one sees no clear trend. In particular,
the difference between the randomized versions
$\lambda_r$ and $186_r$ was indistinguishable from that of the real
networks: $D_E(186_r, \lambda_r) = 32 \pm 2$.

We reasoned that the functional similarity of networks
might be better reflected in a less local measure of functionality.
We therefore introduce a signaling difference
$D_S$, which aims at capturing both direct regulation (as in $D_E$)
and indirect regulation through a sequence of intermediate
proteins. For each pair of proteins i and j we consider
whether i sends a signal to j, and if so whether the signal
along the shortest path is positive or negative. In
this spirit we define the sign of a signal as the product
of the signs of all links on the shortest path from i to
j. An example where this procedure nicely reflects the
functionality in terms of its "Boolean" logic is found
in the pathway from RecA to CI in the two phages. In
λ, active RecA directly catalyzes self-cleavage of CI [12],
whereas in 186, RecA acts through the degradation of
LexA (P03033), which in turn represses the protein Tum
(P41063), which in the absence of repression binds CI
and prevents it from performing its function. Thus the
simple -1 signal in λ is replaced in 186 by a signal
consisting of $(-1) \times (-1) \times (-1) = -1$. In other words,
repressing a repressor is effectively an activation.
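The sign rule can be checked directly on the RecA to CI pathway described above. The sketch below encodes only the links named in the text, with -1 standing for inhibition; the dictionary layout and function name are illustrative assumptions.

```python
from math import prod


def path_sign(signed_links, path):
    """Product of link signs along a path given as a list of node names."""
    return prod(signed_links[(u, v)] for u, v in zip(path, path[1:]))


# Links described in the text (-1 = inhibition).
lambda_links = {("RecA", "CI"): -1}
p186_links = {("RecA", "LexA"): -1, ("LexA", "Tum"): -1, ("Tum", "CI"): -1}

print(path_sign(lambda_links, ["RecA", "CI"]))               # -1
print(path_sign(p186_links, ["RecA", "LexA", "Tum", "CI"]))  # -1
```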

Networks          D_E       P       D_S        P
λ, 186            33        0.27    43         0.01
λ_r, 186_r        32 ± 2            109 ± 33
λ, P22            18        0.00    ...        0.00
λ_r, P22_r        33 ± 4            255 ± 55
P22, 186          25        0.00    97         0.03
P22_r, 186_r      31 ± 1            161 ± 36

TABLE I: The overall difference measures $D_E$ and $D_S$ between
the networks, with respective P-scores as defined in the text.

Because the regulation of one protein by another may
be positive through one series of links and negative
through another, two matrices were used for each network,
one for positive signals ($A^{s+}$ and $B^{s+}$) and one
for negative signals ($A^{s-}$ and $B^{s-}$). If the effect of protein
i on protein j is only positive, then 1 is placed into
$A^{s+}_{ij}$ and 0 into $A^{s-}_{ij}$. If the effect is only negative, then 0
is placed into $A^{s+}_{ij}$ and 1 into $A^{s-}_{ij}$. If there are positive
and negative signals along paths of equal length (e.g.
from RecA to λ CII via LexA or CI), then 1 is placed into
both matrices. Observe that when positive and negative
signals come to the same node, they do not cancel
each other. This is intended, as signals will often arrive
at different times or under different conditions.
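The construction of the two signal matrices can be sketched as a breadth-first search that records, for each reachable protein, every sign carried by a shortest path. The code below is an illustration under assumptions: networks are sets of (regulator, target, sign) tuples, signals from a protein back to itself are ignored, and the nonzero entries of the two matrices are returned as (i, j, sign) triples instead of dense arrays.

```python
from collections import defaultdict


def signal_signs(edges, source):
    """Map each node reachable from `source` to the set of signs (+1/-1)
    carried by its shortest paths from `source`."""
    out = defaultdict(list)
    for i, j, s in edges:
        out[i].append((j, s))
    frontier = {source: {+1}}
    seen = {source}  # assumption: no signal from a node back to itself
    result = {}
    while frontier:
        reached = defaultdict(set)
        for node, signs in frontier.items():
            for target, s in out[node]:
                if target not in seen:
                    reached[target] |= {sign * s for sign in signs}
        seen.update(reached)
        result.update(reached)
        frontier = reached
    return result


def signal_entries(edges, nodes):
    """Nonzero entries of the positive and negative signal matrices."""
    return {(i, j, sign)
            for i in nodes
            for j, signs in signal_signs(edges, i).items()
            for sign in signs}


# Toy network: a reaches d by two equally short paths of opposite sign.
toy = {("a", "b", -1), ("a", "c", -1), ("b", "d", -1), ("c", "d", +1)}
print(signal_signs(toy, "a")["d"] == {+1, -1})  # True: both matrices get a 1
```

Because a target is marked as seen only after its whole layer has been processed, paths of equal length arriving through different intermediates both contribute their sign.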

The signaling difference between two networks A and 
B is then defined as 

$$D_S(A,B) = \sum_{i,j} \left( |A^{s+}_{ij} - B^{s+}_{ij}| + |A^{s-}_{ij} - B^{s-}_{ij}| \right) \qquad (2)$$

which takes into account differences in both positive and 
negative signaling along the shortest paths between any 
pair of nodes. Like $D_E$, the minimum difference $D_S$ is
calculated by optimizing which proteins in 186 should
be identified with which proteins in λ, and in addition
which λ proteins should be excluded. Excluding a protein
means that the signaling to and from that protein is
not counted in $D_S$, whereas signaling across the excluded
protein is included.
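The effect of excluding proteins can be illustrated on the RecA to CI pathway alone. The sketch assumes the signals have already been computed on each full network and stored as (sender, receiver, sign) triples, so that signals passing across an excluded protein are still present. The function name, the pairing dictionary, and the restriction to this single pathway are illustrative assumptions, not the authors' data structures.

```python
def paired_difference(entries_a, entries_b, pairing):
    """Difference between two sets of signed entries under a pairing.

    `pairing` maps nodes of A to nodes of B. Nodes absent from the
    pairing are excluded: entries to or from them are not counted.
    """
    kept_b = set(pairing.values())
    mapped_a = {(pairing[i], pairing[j], s)
                for i, j, s in entries_a if i in pairing and j in pairing}
    kept_entries_b = {(i, j, s)
                      for i, j, s in entries_b if i in kept_b and j in kept_b}
    return len(mapped_a ^ kept_entries_b)


# Direct links named in the text for the RecA -> CI pathway.
lambda_links = {("RecA", "CI", -1)}
p186_links = {("RecA", "LexA", -1), ("LexA", "Tum", -1), ("Tum", "CI", -1)}

# Shortest-path signals implied by those links (products of link signs).
lambda_signals = {("RecA", "CI", -1)}
p186_signals = p186_links | {("RecA", "Tum", +1), ("LexA", "CI", +1),
                             ("RecA", "CI", -1)}

pairing = {"RecA": "RecA", "CI": "CI"}  # LexA and Tum are excluded
print(paired_difference(lambda_links, p186_links, pairing))      # 1
print(paired_difference(lambda_signals, p186_signals, pairing))  # 0
```

On this fragment the direct-link comparison sees a missing connection, whereas the signaling comparison sees identical RecA to CI signals.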

Optimizing protein alignment based on signaling, we
find $D_S(186, \lambda) = 43$. Again, the significance of this
difference was determined by repeatedly randomizing
the networks as described above, creating
the $A^{s+}$ and $A^{s-}$ matrices and obtaining the minimal
$D_S$. The difference between random networks,
$D_S(186_r, \lambda_r) = 109 \pm 33$, is much larger than that between
the real networks. This is further quantified by a P-score,
$P(D_S > D_S(\mathrm{random})) = 0.01$, defined as the probability
that two randomized networks will have a smaller difference
than that between the real networks.
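Read as an empirical frequency, the P-score is the fraction of randomized network pairs whose minimal difference falls below the real one. A minimal sketch follows; the list of randomized differences is invented for illustration and is not data from the paper.

```python
def p_score(real_difference, random_differences):
    """Fraction of randomized pairs with a smaller difference than the real pair."""
    smaller = sum(1 for d in random_differences if d < real_difference)
    return smaller / len(random_differences)


# Invented values, for illustration only.
print(p_score(43, [30, 50, 109, 120]))  # 0.25
```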

Thus all three networks are similar in their signaling
pattern. To confirm that this signaling similarity is
not generally conserved among biological networks, we
compared the phage networks with other networks
that perform different functions (e.g. the S. cerevisiae
cell cycle network [17] and the B. subtilis competence
network [18]). We found that $D_S$ is much larger and the
P-scores are close to 1 in these alignments, indicating
that the low signaling difference between the phage
networks is a special property of these functionally
similar networks.

We have also considered other variants of the difference
measures, in particular including all non-repetitive
paths between pairs of proteins, with all
paths weighted equally. In that case we also find that
$D_{S\text{-all}}(\lambda, 186) = 390$ between the real networks is smaller
than $D_{S\text{-all}}(\lambda_r, 186_r) = 583 \pm 122$ between the randomized
counterparts. Also, using the shortest paths,
we have investigated differences between networks where
weak links (the dashed ones in Fig. 1) are weighted less
(by a factor 0.5) or removed altogether. $D_S$ scores between
networks became smaller, but the overall significance
remained similar.



DISCUSSION 

The pathway-related $D_S$ score allowed us to identify
significant similarity between two very distantly related
biological networks (see Table I). In contrast, the edit
difference measure, which looks only at the local wiring
structure, is sometimes blind to this more global "homology".
Thus although edit difference partially captures
network similarities through a patchwork of local matchings,
it is less sensitive to pathway disruptions.

It is not clear whether the functional similarity between
the λ and 186 networks detected by the $D_S$
measure is a result of convergent evolution or is a remnant
of a shared ancestral network. Under either scenario
it is clear that the two network structures must be
strongly constrained by functional requirements, given
the evolutionary separation of the two phages. A potential
bias should be noted here: knowledge of the three
phage networks is not complete, even for λ, and it is thus
possible that some of the observed similarity in the networks
is due to knowledge of connections in one phage
network having influenced the discovery of connections
in the others.

The $D_S$ alignment allows us to address the role of various
proteins in pathway disruptions. In Figure 3 we line up the
λ and 186 proteins on the basis of pre-existing knowledge
of their function or mode of expression, and indicate
the optimal $D_S$ alignment and the contribution of
each pair to the signaling difference. The two alignments
show good matches for late lytic genes as well as for the
regulators CI, CII and B from 186 aligned with CI, CII
and Q in λ. Thus, in general, the functions of proteins in one
network teach us about protein properties in the other
network. The lack of a good match between Apl (in 186)
and Cro (in λ) is due to the weak links from Cro, and
reflects the different functional roles of Cro and Apl in the
late lytic development of the phages. Insisting on alignment
of Cro with Apl results in $D_S = 219$, thus emphasizing
the particular role of Cro as a repressor of late lysis in λ.
FIG. 3: Alignment of two phage networks. Placement of proteins
is based on prior knowledge, and the lines
connecting them are associated with the minimal $D_S$ alignment.
Proteins that perform similar functions or are regulated similarly
are placed on the same level; thus horizontal lines mark
ideal matching. Blue lines correspond to meaningful alignments,
red lines are the misalignments. The numbers above
the lines, $d_i$, reflect the differences in signaling between the
aligned proteins and are the contributions to the minimal difference
$D_S = \frac{1}{2} \sum_i d_i = 43$. The numbers in parentheses
indicate multiple equivalent proteins, making the sum of all
shown signaling differences equal to $2 \cdot 43$. The key regulators
RecA, LexA and CI are identified correctly, whereas the
misidentification of CII with CIII is reasonable since both
favor entry into lysogeny through the same pathway. The
major discrepancy is associated with the different roles of Cro and
Apl during lysis (the weak links from Cro to Q and N in λ).

Comparison of molecular networks is becoming an important
element of modern systems biology, both with
regard to predictions of possible missing links and
for increasing our understanding of the functionality of information
processing in the networks. The alignment methods presented
here address similarities on a local scale and on a
larger scale associated with signaling across
networks.

In this regard we found that evolutionary relationships
(λ and P22) imply similar local regulation, with a low
$D_E$ score. For all temperate phages, evolved to do similar
"computation", their regulatory networks are found
to be similar when viewed from a more global perspective
where both direct and indirect signals are included
(low $D_S$ score compared to random expectation). Thus
the mechanistic and structural differences on the scale
of genome and promoter organization disappear when
considering the large scale of the protein regulatory networks.
Going beyond immediate regulations allows one to
capture functional similarity in the most robust way.

MATERIALS AND METHODS 

The present paper is based on data for three bacteriophages:
λ (accession no. J02459), P22 (NC_002371)
and 186 (U32222). The regulatory networks were compiled
from these database entries and from various literature
sources for each of λ, 186 and P22.

In the Results section we define two difference scores,
$D_E$ and $D_S$, between a pair of networks A and B. Provided
that we know which proteins in A should be identified
with which in B, the scores are calculated as in Eq. 1
and Eq. 2, respectively. In case we do not know which nodes in
networks A and B should be paired, we need to find the
optimal identification of nodes between them. To do so,
we define an alignment procedure through the Metropolis
algorithm [22], designed to reach the minimal distance
D between the networks: given two nodes and their corresponding
partners in the other network, the elementary
step is to switch partners and reevaluate the distance.
Iterating this procedure and using simulated annealing
[23], the method converges towards the global minimum.
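The partner-switching step can be sketched as a Metropolis search with a cooling temperature. The code below is an illustration under assumptions, not the authors' applet: networks of equal size are stored as sets of (regulator, target, sign) tuples over nodes 0..n-1, the distance is the edit difference, and the geometric cooling schedule and its parameters are arbitrary choices.

```python
import math
import random


def difference(edges_a, edges_b, perm):
    """Edit difference when node i of A is paired with node perm[i] of B."""
    mapped = {(perm[i], perm[j], s) for i, j, s in edges_a}
    return len(mapped ^ edges_b)


def anneal_alignment(edges_a, edges_b, n, steps=20000,
                     t_start=2.0, t_end=0.05, seed=0):
    """Metropolis search over pairings with a geometric cooling schedule."""
    rng = random.Random(seed)
    edges_b = set(edges_b)
    perm = list(range(n))
    rng.shuffle(perm)
    current = difference(edges_a, edges_b, perm)
    best, best_perm = current, perm[:]
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]  # switch partners
        proposed = difference(edges_a, edges_b, perm)
        accept = (proposed <= current or
                  rng.random() < math.exp((current - proposed) / temp))
        if accept:
            current = proposed
            if current < best:
                best, best_perm = current, perm[:]
        else:
            perm[i], perm[j] = perm[j], perm[i]  # reject: undo the swap
    return best, best_perm


# Network B is network A with relabeled nodes (illustrative only).
net_a = {(0, 1, -1), (1, 2, +1)}
net_b = {(2, 0, -1), (0, 1, +1)}
best, best_perm = anneal_alignment(net_a, net_b, n=3)
print(best, best_perm)
```

Keeping the best pairing seen so far, instead of only the final state, guards against the search ending on a state it had accepted as an uphill move.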

If the two networks are of different size we count only 
the contribution from a number of nodes given by the 
smaller of the two networks. In the larger network these 
nodes are selected to minimize the distance using the 
above algorithm. 

We would like to note that the above method is not 
intended to reflect any evolutionary process, but is used 
to find the optimal mapping of pairs of proteins that 
look similar from the network perspective. The method 
is limited by the network size, and in practice works for 
networks below 200 nodes. 

An implementation of the alignment algorithm
in the form of a Java applet is available at
http://www.cmol.nbi.dk/models/compar/compar.html